 Kyūshū & Okinawa


Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems

Considering that different geographic regions generally have their own native sign languages, it is valuable to establish corresponding SLT datasets to support related communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale dataset for SLT.



AI could replace foreign workers in Japan, Team Mirai says

The Japan Times

Foreign workers in Japan became one of the main topics of all parties in the Feb. 8 Lower House election, which took place just after a Jan. 23 Cabinet decision calling for 1,231,900 foreign workers by March 2029 in 19 sectors facing acute labor shortages. While some parties argued for strictly monitoring foreign nationals or setting quotas on their numbers, especially at the local level, an artificial-intelligence engineer-led party that went into the election with no seats and emerged with 11 proportional representation seats proposed the increased use of AI to replace workers, including foreign nationals, as a solution to concerns about more foreign workers. Team Mirai, founded in May and led by Takahiro Anno, won four seats in the Tokyo block and three in the South Kanto block, along with one seat each in the Tohoku, North Kanto, Tokai, and Kyushu blocks.





Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
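To make the idea of differing tokenization lengths concrete, here is a minimal sketch that compares token counts for the same short phrase in two languages. It assumes a stand-in tokenizer that emits one token per UTF-8 byte; the function names `byte_tokenize` and `length_ratio` and the sample phrases are hypothetical choices for illustration, not the tokenizers or data examined in the paper.

```python
def byte_tokenize(text: str) -> list[int]:
    """Stand-in tokenizer (an assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def length_ratio(text_a: str, text_b: str, tokenize=byte_tokenize) -> float:
    """Return how many times longer text_b's token sequence is than text_a's."""
    return len(tokenize(text_b)) / len(tokenize(text_a))


# Illustrative phrase pair chosen for this sketch (not taken from the paper).
english = "Hello, world"
japanese = "こんにちは、世界"

print(len(byte_tokenize(english)), len(byte_tokenize(japanese)))
print(length_ratio(english, japanese))
```

Any other tokenizer can be passed through the `tokenize` parameter, so the same comparison could be run on a trained subword tokenizer if one is available.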


Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

Taiji Suzuki, Denny Wu

Neural Information Processing Systems

Mean-field Langevin dynamics (MFLD) (Mei et al., 2018; Hu et al., 2019) is particularly attractive due to the ... MFLD arises from a noisy gradient descent update on the parameters, where Gaussian noise is injected into the gradient to encourage "exploration". Furthermore, uniform-in-time estimates of the particle discretization error have also been established (Suzuki et al., ... The goal of this work is to address the following question.
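The noisy gradient descent update mentioned above can be sketched generically as follows. This is a plain Langevin-style step on a list of scalar parameters, not the paper's algorithm: the function name `noisy_gd_step`, the `temperature` parameter, the noise scale `sqrt(2 * step_size * temperature)`, and the quadratic toy objective are all assumptions made for illustration.

```python
import math
import random


def noisy_gd_step(params, grad_fn, step_size, temperature, rng):
    """One gradient step with Gaussian noise added to each parameter.

    The noise scale sqrt(2 * step_size * temperature) is the usual Langevin
    discretization; it is an assumption here, not taken from the source.
    """
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    grads = grad_fn(params)
    return [
        p - step_size * g + noise_scale * rng.gauss(0.0, 1.0)
        for p, g in zip(params, grads)
    ]


def quadratic_grad(params):
    """Gradient of the toy objective 0.5 * sum(p**2), i.e. the parameters themselves."""
    return list(params)


rng = random.Random(0)  # seeded so that runs are repeatable
particles = [1.0, -2.0, 0.5]
for _ in range(100):
    particles = noisy_gd_step(particles, quadratic_grad, 0.1, 0.01, rng)
print(particles)
```

Setting `temperature` to zero removes the noise and recovers ordinary gradient descent, which makes the role of the injected noise easy to isolate.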